The woylier package implements an alternative method for interpolating the path between tour frames, using Givens rotations.
Projection pursuit, originally proposed by Kruskal (Kruskal 1969), is a procedure for locating the projection from high- to low-dimensional space that exposes the most interesting features of the data. The technique relies on a criterion of interest, expressed as a numerical objective function, and the most interesting projection of the data is the one that maximizes this criterion. A number of such criteria have been developed in the literature, based on clustering, spread, and outliers.
The grand tour is a multivariate data visualization technique developed by Asimov (Asimov 1985), based on the idea of rotating a lower-dimensional projection in high-dimensional space. The technique is analogous to rotating an object in 3D to better understand its shape and dimensions. Originally, Asimov’s grand tour presented the viewer with an automatic movie of projections with no user control. Since then, the literature on the tour has focused on interactivity, giving control to users (Buja et al. 2005). One such interactive visual data exploration tool is the “guided tour”.
The guided tour gives the user extrapolation power by combining projection pursuit with the tour. It is implemented in the “tourr” (Wickham et al. 2011) package. The current implementation of the guided tour uses geodesic interpolation between planes.
Paths produced by geodesic interpolation between bases are visually invariant under changes of orientation. However, in some cases of non-linear projection pursuit the orientation of the frames does matter. One example is the splines2D index. This lack of rotation invariance in non-linear projection pursuit indexes was the motivation for this work.
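To make the orientation issue concrete, the following sketch rotates a 2-D projection within its own plane and evaluates a simple non-linear index before and after. The index `quad_index()` is a hypothetical stand-in (the variance explained by a quadratic fit), not the splines2D index from tourr, and the parabola data is simulated purely for illustration.

```r
set.seed(3)
# Hypothetical 2-D projection showing a noisy parabola
x <- seq(-1, 1, length.out = 201)
y <- x^2 + rnorm(length(x), sd = 0.05)
proj <- cbind(x, y)

# Stand-in index (NOT tourr's splines2D): share of variance in the second
# coordinate explained by a quadratic fit on the first coordinate
quad_index <- function(proj) {
  u <- proj[, 1]
  v <- proj[, 2]
  fit <- lm(v ~ u + I(u^2))
  1 - var(residuals(fit)) / var(v)
}

# Rotate the projected points by 45 degrees within the same plane
theta <- pi / 4
rot2 <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
proj_rot <- proj %*% t(rot2)

idx_orig <- quad_index(proj)      # close to 1: the parabola is captured
idx_rot  <- quad_index(proj_rot)  # noticeably lower: same plane, new orientation
```

Both point clouds lie in the same plane, so a plane-based interpolation treats them as equivalent even though an index of this kind scores them differently.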
A few alternatives to geodesic interpolation were proposed by Buja et al. (Buja et al. 2005). The purpose of the woylier package is to implement the Givens paths method. This algorithm adapts the Givens matrix decomposition technique, which zeros out all but the first element of a vector.
This article is structured as follows. The next section provides the theoretical framework of the Givens interpolation method, followed by a section on the implementation of the Givens path in the woylier package, which is compatible with the current geodesic_path() function of the tourr package. We then apply this interpolation method to projection pursuit with the splines index to search for nonlinear associations between variables in an example data set. Finally, the article discusses the limitations.
High-dimensional data visualisation
A projection is a tool for visualizing high-dimensional data in lower dimensions (Buja et al. 2005). When projecting higher-dimensional data onto lower dimensions, one might care about the orientation of the projection. In such cases the projection needs to be onto a frame rather than a plane, because for a plane orientation does not matter.
Visualization of data in more than three dimensions is based on rotations of a lower-dimensional projection in high-dimensional space. An animation of these projections is a one-parameter (time) family of pictures.
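The distinction between a frame and a plane can be sketched in a few lines of base R. The data and frame below are simulated for illustration only; the variable names are assumptions and not part of any package API.

```r
set.seed(1)
# Hypothetical data: 100 observations in p = 5 dimensions
X <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)

# Orthonormal 2-frame in 5-space: orthonormalise a random 5 x 2 matrix via QR
frame <- qr.Q(qr(matrix(rnorm(5 * 2), nrow = 5, ncol = 2)))
ortho_err <- max(abs(crossprod(frame) - diag(2)))  # ~0: columns are orthonormal

# Projected data: one row per observation, one column per frame vector
proj <- X %*% frame

# Rotating the frame within its own plane gives a different frame ...
theta <- pi / 6
rot2 <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
frame_rot <- frame %*% rot2
frame_diff <- max(abs(frame - frame_rot))          # clearly non-zero

# ... but the same plane: the projection matrix F F^T is unchanged
plane_diff <- max(abs(tcrossprod(frame) - tcrossprod(frame_rot)))  # ~0
```

The two frames produce scatterplots that differ only by a rotation, which is exactly the difference that plane-based interpolation ignores.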
This paper explains algorithms for dynamic projections such as grand tours, guided tours, and manual tours.
While we can imagine the rotation of a 3D object, generalizing rotation to more than three dimensions is quite complex. The notion of the grand tour was introduced by Asimov (1985). The grand tour shows 2-D projections of a higher-dimensional space with no user control. It is a space-filling curve in the manifold of low-dimensional projections of a high-dimensional data space. The authors of this paper further explore the interactivity of tours, which resulted in “guided tours” and “manual tours”.
The topic of this paper is the construction of paths of projections. Interpolating paths of projections can be compared to connecting line segments that interpolate points in Euclidean space. Interpolation acts as a bridge between continuous animation and a discrete choice of sequences of projections. A sequence of projections can be constructed in various ways depending on the user's purpose. If the user wants to look at the data from all sides, a random sequence of projections can be used, as implemented in grand tours. The sequence of projections can also be pre-computed, data-driven, or even manually controlled.
Projection pursuit is a technique for finding data projections that are most structured according to a criterion of interest such as clustering or spread.
Buja et al. (Buja et al. 2005) prefer the interpolation method over the original “torus method” used in Asimov (1985) for several reasons. One reason is that projection paths based on the torus method can be non-uniformly distributed, while paths from the interpolation method are uniformly distributed by construction. Another pitfall of the torus method is that it causes discontinuity when the user needs to change the set of variables being viewed.
We aim to provide an alternative interpolation method that is compatible with the current geodesic_path() function of the tourr package. We then apply this interpolation method to projection pursuit with the splines index to search for nonlinear associations between variables in a financial data set. Finally, we provide some examples of using the package.
Givens rotation (Golub and Loan 1989) algorithm
Tour
https://en.wikipedia.org/wiki/Grand_Tour_(data_visualisation)
Rotation and projection
Planar rotation
A rotation matrix is a transformation matrix that performs a rotation in Euclidean space. In the xy-plane, the matrix that rotates the 2-D plane by an angle \(\theta\) is:
\[ \begin{bmatrix}\cos \theta &-\sin \theta \\\sin \theta &\cos \theta \end{bmatrix} \]
If the rotation is in the plane of variables \(i\) and \(j\), it is called a Givens rotation.
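As a minimal sketch, a Givens rotation in \(p\) dimensions can be built by embedding the 2-D rotation matrix into the rows and columns \(i\) and \(j\) of an identity matrix. The helper `givens_matrix()` is a hypothetical name used for illustration here and is not taken from the woylier API; its sign convention follows the 2-D matrix above.

```r
# Hypothetical helper: p x p Givens rotation by angle theta in the (i, j)-plane
givens_matrix <- function(p, i, j, theta) {
  stopifnot(i != j, i >= 1, j >= 1, i <= p, j <= p)
  G <- diag(p)
  G[i, i] <- cos(theta)
  G[j, j] <- cos(theta)
  G[i, j] <- -sin(theta)
  G[j, i] <- sin(theta)
  G
}

# Quarter turn in the plane of variables 1 and 3 of a 4-D space
G <- givens_matrix(p = 4, i = 1, j = 3, theta = pi / 2)
x <- c(1, 2, 3, 4)
y <- as.vector(G %*% x)  # only coordinates 1 and 3 change: c(-3, 2, 1, 4)

# A Givens rotation is orthogonal, so it preserves lengths and angles
orth_err <- max(abs(crossprod(G) - diag(4)))
```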
The interpolation methods in this project are based on the composition of a number of Givens rotations that map the starting frame onto the target frame.
\[ W_z = R_m(\tau_m) \cdots R_2(\tau_2)R_1(\tau_1)W_a\]
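The composition can be sketched as follows. The rotation planes and angles are arbitrary values chosen for illustration, the helper names are hypothetical, and scaling every angle by a fraction between 0 and 1 to obtain intermediate frames is an assumption about how such a composition is turned into a path, not a description of woylier's internals.

```r
# Hypothetical helper: p x p Givens rotation by angle theta in the (i, j)-plane
givens_matrix <- function(p, i, j, theta) {
  G <- diag(p)
  G[i, i] <- cos(theta); G[j, j] <- cos(theta)
  G[i, j] <- -sin(theta); G[j, i] <- sin(theta)
  G
}

# Illustrative rotation sequence: each entry is (i, j, tau)
steps <- list(c(1, 3, 0.7), c(2, 4, -1.1), c(1, 2, 0.4))

# Apply R_m(frac * tau_m) ... R_1(frac * tau_1) to W; frac = 1 gives the
# full composition, frac in (0, 1) an intermediate frame
rotate_frame <- function(W, steps, frac = 1) {
  for (s in steps) {
    W <- givens_matrix(nrow(W), s[1], s[2], frac * s[3]) %*% W
  }
  W
}

W_a <- diag(4)[, 1:2]                      # starting frame (standard 2-frame)
W_z <- rotate_frame(W_a, steps)            # frame reached at frac = 1
W_half <- rotate_frame(W_a, steps, 0.5)    # frame halfway along the path
W_start <- rotate_frame(W_a, steps, 0)     # frac = 0 leaves W_a unchanged
```

Because each factor is orthogonal, every intermediate frame stays orthonormal, which is what makes the scaled composition usable as a path of frames.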
Interpolation in tour
Interpolating path of Frames
Frame interpolation is necessary when the orientation of the projection matters. Several methods are discussed in the paper, including decomposition of orthogonal matrices, Givens decomposition, and Householder decomposition. The one of interest to us is the Givens path.
Givens rotations are useful because, for any vector \(u\), one can zero out the \(i\)-th coordinate with a Givens rotation in the \((i, j)\)-plane for any \(j \neq i\). This rotation affects only coordinates \(i\) and \(j\) and leaves all coordinates \(k \neq i, j\) unchanged.
Sequences of Givens rotations can map any orthonormal \(d\)-frame \(F\) in \(p\)-space to the standard \(d\)-frame \(E_d=((1, 0, 0, \ldots)^T, (0, 1, 0, \ldots)^T, \ldots)\).
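A small worked example of this zeroing step is given below. The helper names are hypothetical, and the angle formula `atan2(u[i], u[j])` is tied to the sign convention used in this sketch (the rotated \(i\)-th coordinate is \(\cos\theta\, u_i - \sin\theta\, u_j\)).

```r
# Hypothetical helper: p x p Givens rotation by angle theta in the (i, j)-plane
givens_matrix <- function(p, i, j, theta) {
  G <- diag(p)
  G[i, i] <- cos(theta); G[j, j] <- cos(theta)
  G[i, j] <- -sin(theta); G[j, i] <- sin(theta)
  G
}

# Angle that zeroes coordinate i of u with a rotation in the (i, j)-plane
zeroing_angle <- function(u, i, j) atan2(u[i], u[j])

u <- c(3, 1, 4, 2)
theta <- zeroing_angle(u, i = 3, j = 1)
v <- as.vector(givens_matrix(length(u), 3, 1, theta) %*% u)
# v is c(5, 1, 0, 2) up to rounding: coordinate 3 is zeroed, its length is
# absorbed into coordinate 1, and coordinates 2 and 4 are untouched
```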
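The reduction to the standard frame can be sketched by zeroing, column by column, every entry below the diagonal. This is an illustrative base R version under the assumption \(d < p\); the function names are hypothetical and the ordering of rotations is one possible choice, not necessarily the one used in woylier. The recorded angles are then replayed in reverse order with negated signs to recover the original frame.

```r
# Hypothetical helper: p x p Givens rotation by angle theta in the (i, j)-plane
givens_matrix <- function(p, i, j, theta) {
  G <- diag(p)
  G[i, i] <- cos(theta); G[j, j] <- cos(theta)
  G[i, j] <- -sin(theta); G[j, i] <- sin(theta)
  G
}

# Rotate an orthonormal p x d frame (d < p) onto the standard d-frame,
# recording the plane and angle of every rotation
reduce_to_standard <- function(frame) {
  p <- nrow(frame); d <- ncol(frame)
  stopifnot(d < p)
  W <- frame
  steps <- list()
  for (k in seq_len(d)) {
    for (i in (k + 1):p) {
      theta <- atan2(W[i, k], W[k, k])   # zeroes W[i, k], keeps W[k, k] >= 0
      W <- givens_matrix(p, i, k, theta) %*% W
      steps[[length(steps) + 1]] <- c(i = i, j = k, theta = theta)
    }
  }
  list(W = W, steps = steps)
}

set.seed(2)
frame <- qr.Q(qr(matrix(rnorm(4 * 2), 4, 2)))  # random orthonormal 2-frame
res <- reduce_to_standard(frame)
E_d <- diag(4)[, 1:2]
reduce_err <- max(abs(res$W - E_d))            # ~0: frame mapped onto E_d

# Undo the reduction: reverse the sequence and negate the angles
rebuilt <- E_d
for (s in rev(res$steps)) {
  rebuilt <- givens_matrix(4, s[["i"]], s[["j"]], -s[["theta"]]) %*% rebuilt
}
rebuild_err <- max(abs(rebuilt - frame))       # ~0: original frame recovered
```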
The path construction algorithm works as follows. It starts from the basis
\[B = (F_a, F_{\star})\]
A sequence of Givens rotations then maps \(W_z\) to \(W_a\):
\[ W_a = R_m(\tau_m) \cdots R_2(\tau_2)R_1(\tau_1)W_z\]
The inverse mapping is obtained by reversing the sequence of rotations with the negative of the angles:
\[R(\tau) = R_1(-\tau_1) \cdots R_m(-\tau_m), \ W_z = R(\tau)W_a\]
## Limitations
Buja et al. (2004) discussed when the orientation of a projection matters. If rendering on a frame and on a rotated version of the frame yields the same visual scene, the orientation does not matter.
When d=1, there is only a one-dimensional projection, visualized horizontally or vertically. If the direction were interpretable, the left-to-right and right-to-left projections would differ. But in our case with d=1, orientation is irrelevant because the projection is just a linear combination of variables.
When d=2, we usually plot a Cartesian scatterplot. Under reflection or rotation of the scatterplot, typical structures such as clusters, lines, curves, and outliers remain recognizable. Therefore, orientation does not matter.
Interactive data graphics provide plots that allow users to interact with them. One of the most basic types of interaction is through tooltips, where users are given additional information about elements in the plot by moving the cursor over the plot.
This paper will first review some R packages on interactive graphics and their tooltip implementations. A new package, ToOoOlTiPs, which provides customized tooltips for plots, is then introduced. Some example plots will then be given to showcase how these tooltips help users to better read the graphics.
Packages for interactive graphics include plotly (Sievert 2020), which interfaces with JavaScript for web-based interactive graphics, and crosstalk (Cheng and Sievert 2021), which specializes in cross-linking elements across individual graphics. The recent R Journal paper tsibbletalk (Wang and Cook 2021) provides a good example of including interactive graphics in an article for the journal. It has both a set of linked plots and an animated gif example, illustrating linking between time series plots and feature summaries.
ToOoOlTiPs is a package for customizing tooltips in interactive graphics; it features these possibilities.
The palmerpenguins data (Horst et al. 2020) features three penguin species, shown in a lovely illustration by Allison Horst in Figure 1.
Figure 1: Artwork by @allison_horst
Table 1 shows the first few rows of the penguins data:
| species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
|---|---|---|---|---|---|---|---|
| Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male | 2007 |
| Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female | 2007 |
| Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female | 2007 |
| Adelie | Torgersen | NA | NA | NA | NA | NA | 2007 |
| Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female | 2007 |
| Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male | 2007 |
Figure 2 shows an interactive plot of the penguins data, made using the plotly package.
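```r
library(ggplot2)         # ggplot(), aes(), geom_point()
library(plotly)          # ggplotly(); also provides the %>% pipe
library(palmerpenguins)  # penguins data
# Scatterplot of bill depth against bill length, coloured by species
p <- penguins %>%
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm,
             color = species)) +
  geom_point()
# Convert the static ggplot into an interactive plotly widget
ggplotly(p)
```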
Figure 2: A basic interactive plot of the Palmer penguins data made with the plotly package. Three species of penguins are plotted with bill depth on the x-axis and bill length on the y-axis. When hovering over a point, a tooltip shows the exact bill depth and length for that point, along with the species name.
We have displayed various tooltips that are available in the package ToOoOlTiPs.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Batsaikhan, et al., "Woylier: Alternative tour frame interpolation method", The R Journal, 2022
BibTeX citation
@article{woylier_article,
author = {Batsaikhan, Zoljargal and Cook, Dianne and Laa, Ursula},
title = {Woylier: Alternative tour frame interpolation method},
journal = {The R Journal},
year = {2022},
note = {https://doi.org/10.32614/woylier_article},
doi = {10.32614/woylier_article},
issn = {2073-4859},
pages = {1}
}